Shor's algorithm, which given appropriate hardware can factorise an integer $N$ in a time
polynomial in its binary length $L$, has arguably spurred the race to build a practical quantum
computer. Several different quantum circuits implementing Shor's algorithm have been designed,
but each tacitly assumes that arbitrary pairs of qubits within the computer can be interacted.
While some quantum computer architectures possess this property, many promising proposals are
best suited to realising a single line of qubits with nearest neighbour interactions only. In light
of this, we present a circuit implementing Shor's factorisation algorithm designed for such a
linear nearest neighbour architecture. Despite the interaction restrictions, the circuit requires
just $2L + 4$ qubits and to first order requires $8L^4$ gates arranged in a circuit of depth
$32L^3$, identical to first order to that possible using an architecture that can interact
arbitrary pairs of qubits.



I. INTRODUCTION 

A quantum computer is a device that manipulates a collection of small, interacting
quantum systems. Usually each quantum system contains just two states $|0\rangle$ and
$|1\rangle$, and is called a qubit. Unlike the bits in classical computers, qubits can be
placed in arbitrary superpositions $\alpha|0\rangle + \beta|1\rangle$ and entangled with one
another, e.g. $(|00\rangle + |11\rangle)/\sqrt{2}$. These two properties have enabled quantum
algorithms to be devised that are exponentially faster than their classical equivalents [1, 2].

Building a practical device to implement quantum al- 
gorithms is a daunting task. When devising a quantum 
algorithm it is frequently assumed that any pair of qubits 
in the computer can be interacted. However, many physical quantum computer proposals
utilise short range forces to couple qubits and hence only permit nearest neighbour
interactions [3–20]. Indeed, each of the cited proposals recommends that just a single
line of qubits be built. Determining whether these linear nearest neighbour (LNN)
architectures can implement quantum algorithms in a practical manner is a nontrivial and
important question.

Implementing Shor's factorisation algorithm [1] is arguably the focus of much experimental
quantum computer research due to its encryption breaking applications. A necessary test of
any architecture is therefore whether or not it can implement Shor's algorithm efficiently.
In light of this, we present an LNN circuit implementing Shor's algorithm requiring just
$2L + 4$ qubits and $8L^4 + 40L^3 + 116.5L^2 + 4.5L - 2$ gates arranged in a circuit of depth
$32L^3 + 80L^2 - 4L - 2$. The depth of a circuit is the number of layers of gates in it, where
a layer of gates is a set of gates implemented in parallel. The circuit presented in this
paper is based on the Beauregard circuit [21], which is designed for an architecture that can
interact arbitrary pairs of qubits. To first order the Beauregard circuit also has a gate
count of $8L^4$ and circuit depth of $32L^3$, provided one adds an additional qubit to the
circuit to allow repeated Toffoli gates to be implemented more quickly. The precise
differences are detailed throughout the paper.

The paper is structured as follows. In section II Shor's algorithm is briefly reviewed.
In section III Shor's algorithm is broken into a series of simple tasks. Section IV
contains a brief description of the canonical decomposition which is used to build fast
2-qubit gates. Sections V to IX present, in order of increasing complexity, the LNN quantum
circuits that together comprise the LNN Shor quantum circuit. The LNN quantum Fourier
transform (QFT) is presented first, followed by modular addition, the controlled swap,
modular multiplication, and finally the complete circuit. Section X concludes with a
summary of all results, and a description of further work.



II. SHOR'S ALGORITHM 

Shor's algorithm [1] was published in 1994 and greeted with great excitement due to its
potential to break the popular RSA encryption protocol [23]. RSA is used in all aspects of
e-commerce from Internet banking to secure online payment and can also be used to
facilitate secure message transmission. The security of RSA is conditional on large
integers being difficult to factorise, which has so far proven to be the case when using
classical computers. Shor's algorithm renders this problem tractable.

To be specific, let $N = N_1 N_2$ be a product of two prime numbers. Let $L = \log_2 N$ be
the binary length of $N$. Given $N$, Shor's algorithm enables the determination of $N_1$
and $N_2$ in a time polynomial in $L$. This is achieved indirectly by finding the period $r$
of $f(k) = m^k \bmod N$, where $m$ is a randomly selected integer less than and coprime to
$N$. A classical computer can then use $N$, $m$, and $r$ to determine $N_1$ and $N_2$.
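
As a purely classical illustration of this last step, the following minimal sketch finds the
period by brute force (standing in for the quantum part of the algorithm, and feasible only
for very small $N$) and then recovers the factors using greatest common divisors. The
function names `period` and `factors_from_period` are hypothetical and chosen only for this
example.

```python
from math import gcd


def period(m: int, N: int) -> int:
    """Smallest r > 0 with m**r mod N == 1, found by brute force.

    This loop is a classical stand-in for the quantum period finding step.
    """
    if gcd(m, N) != 1:
        raise ValueError("m must be coprime to N")
    r, value = 1, m % N
    while value != 1:
        value = (value * m) % N
        r += 1
    return r


def factors_from_period(N: int, m: int, r: int):
    """Return (N1, N2) computed from the period r, or None if r is unsuitable."""
    if r % 2 != 0:
        return None                      # r must be even
    half = pow(m, r // 2, N)             # f(r/2) = m^(r/2) mod N
    if half == N - 1:
        return None                      # f(r/2) = -1 mod N gives only trivial factors
    return gcd(half + 1, N), gcd(half - 1, N)
```

For example, with $N = 15$ and $m = 7$ the period is $r = 4$, $f(r/2) = 4$, and the two greatest
common divisors give the factors 5 and 3. With $m = 14$ the period is 2 and $f(1) = 14 = -1
\bmod 15$, so a different $m$ would have to be chosen.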

Quantum circuits implementing Shor's algorithm can be designed for conceptual simplicity
[23], speed [23], minimum number of qubits [21] or a tradeoff between speed and number of
qubits [25]. Table I summarises the various qubit counts and gate depths. Note that
generally speaking time can be saved at the cost of more qubits.

Circuit          Qubits        Depth
Qubits [21]      $\sim 2L$     $\sim 32L^3$
Tradeoff [25]    $\sim 50L$    $\sim 2^{19}L^{1.2}$

TABLE I: Number of qubits required and circuit depth of different implementations of
Shor's algorithm. Where possible, figures are accurate to first order in $L$.



An underlying procedure common to all implementations does exist. The first common step
involves initializing the quantum computer to a single pure state
$|0\rangle_{2L}|0\rangle_L$. Note that for clarity the computer state has been broken into
a $2L$ qubit $k$ register and an $L$ qubit $f$ register. The meaning of this will become
clearer below. The various optimisations used to achieve the qubit count of $2L + 4$
discussed in this paper will be presented in section IX.

Step two is to Hadamard transform each qubit in the $k$ register yielding

$$ \frac{1}{2^L} \sum_{k=0}^{2^{2L}-1} |k\rangle_{2L}|0\rangle_L. \tag{1} $$

Step three is to calculate and store the corresponding values of $f(k)$ in the $f$ register

$$ \frac{1}{2^L} \sum_{k=0}^{2^{2L}-1} |k\rangle_{2L}|f(k)\rangle_L. \tag{2} $$

Note that this step requires additional ancilla qubits. The exact number depends heavily
on the precise details of the implementation.

Step four can actually be omitted but it explicitly shows the origin of the period $r$
being sought. Measuring the $f$ register yields

$$ \frac{\sqrt{r}}{2^L} \sum_{n=0}^{2^{2L}/r-1} |k_0 + nr\rangle_{2L}|f_0\rangle_L, \tag{3} $$

where $f_0$ is the measured value and $k_0$ is the smallest value of $k$ such that
$f(k) = f_0$. The upper limit and normalisation shown are exact when $r$ divides $2^{2L}$.

Step five is to apply the quantum Fourier transform

$$ |k\rangle \rightarrow \frac{1}{2^L} \sum_{j=0}^{2^{2L}-1} \exp(2\pi i jk/2^{2L})|j\rangle \tag{4} $$

to the $k$ register resulting in

$$ \frac{\sqrt{r}}{2^{2L}} \sum_{j=0}^{2^{2L}-1} \sum_{p=0}^{2^{2L}/r-1} \exp(2\pi i j(k_0 + pr)/2^{2L})|j\rangle_{2L}|f_0\rangle_L. \tag{5} $$

The probability of measuring a given value of $j$ is thus

$$ \Pr(j, r, L) = \left| \frac{\sqrt{r}}{2^{2L}} \sum_{p=0}^{2^{2L}/r-1} \exp(2\pi i jpr/2^{2L}) \right|^2. \tag{6} $$

If $r$ divides $2^{2L}$, Eq. (6) can be evaluated exactly. The probability of observing
$j = c2^{2L}/r$ for some integer $0 \le c < r$ is $1/r$, whereas if $j \neq c2^{2L}/r$ the
probability is 0. When $r$ is not a power of 2 all one can say is that with high
probability the value $j$ measured will satisfy $j \simeq c2^{2L}/r$ for some $0 \le c < r$.
In either case, given a measurement $j \simeq c2^{2L}/r$ with $c \neq 0$, information about
$r$ can be extracted via simple classical manipulations. Note that it is quite likely that
$r$ will not be completely determined after just one run of the steps described above and
that further runs will be required. Even when the final value of $r$ is determined, if $r$
is not even or $r$ does not satisfy $f(r/2) \not\equiv -1 \pmod{N}$ the entire process needs
to be repeated for a different value of $m$ in an attempt to get a different value of $r$.
When a value of $r$ is found with the appropriate properties, the factors of $N$ can be
determined from $N_1 = \gcd(f(r/2) + 1, N)$ and $N_2 = \gcd(f(r/2) - 1, N)$.



III. DECOMPOSING SHOR'S ALGORITHM



The purpose of this section is to break Shor's algorithm 
into a series of steps that can be easily implemented as 
quantum circuits. Neglecting the classical computations 
and optional measurement step described in the previous 
section, Shor's algorithm has already been broken into 
four steps. 

1. Hadamard transform. 

2. Modular exponentiation. 

3. Quantum Fourier transform. 

4. Measurement. 

The modular exponentiation step is the only one that 
requires further decomposition. 

The calculation of $f(k) = m^k \bmod N$ is firstly broken up into a series of controlled
modular multiplications.

$$ f(k) = \prod_{i=0}^{2L-1} \left( m^{2^i k_i} \bmod N \right), \tag{7} $$

where $k_i$ denotes the $i$th bit of $k$. If $k_i = 1$ the multiplication by
$m^{2^i} \bmod N$ occurs, and if $k_i = 0$ nothing happens.
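
Eq. (7) can be checked classically. The sketch below is only an illustration under the
assumption that $k$ is a $2L$ bit integer; the function name is hypothetical. Each loop
iteration plays the role of one controlled modular multiplication.

```python
def modexp_by_controlled_multiplications(m: int, k: int, N: int, L: int) -> int:
    """Evaluate f(k) = m**k mod N as in eq. (7), one multiplication per bit of k."""
    result = 1
    for i in range(2 * L):
        k_i = (k >> i) & 1               # i-th bit of k, acting as the control
        a_i = pow(m, 2 ** i, N)          # classical constant m^(2^i) mod N
        if k_i == 1:                     # multiply only when the control bit is set
            result = (result * a_i) % N
    return result
```

The constants $m^{2^i} \bmod N$ depend only on $m$, $N$ and $i$, so they can be computed
classically in advance.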

There are many different ways to implement controlled modular multiplication. The methods
of [21] require the fewest qubits and will be used in this paper. To illustrate how each
controlled modular multiplication proceeds, let $a(i) = m^{2^i} \bmod N$ and

$$ x(i) = \prod_{j=0}^{i-1} \left( m^{2^j k_j} \bmod N \right). \tag{8} $$

$x(i)$ represents a partially completed modular exponentiation and $a(i)$ the next term to
multiply by. Let $|x(i), 0\rangle$ denote a quantum register containing $x(i)$ and another
of equal size containing 0. Firstly, add $a(i)$ modularly multiplied by the first register
to the second register if and only if (iff) $k_i = 1$.

$$ |x(i), 0\rangle \rightarrow |x(i), 0 + a(i)x(i) \bmod N\rangle = |x(i), x(i+1)\rangle. \tag{9} $$

Secondly, swap the registers iff $k_i = 1$.

$$ |x(i), x(i+1)\rangle \rightarrow |x(i+1), x(i)\rangle. \tag{10} $$

Thirdly, subtract $a(i)^{-1}$ modularly multiplied by the first register from the second
register iff $k_i = 1$.

$$ |x(i+1), x(i)\rangle \rightarrow |x(i+1), x(i) - a(i)^{-1}x(i+1) \bmod N\rangle = |x(i+1), 0\rangle. \tag{11} $$

Note that while nothing happens if $k_i = 0$, by the definition of $x(i)$ the final state
in this case will still be $|x(i+1), 0\rangle$.
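
The three steps of eqs. (9) to (11) can be traced on ordinary integers. The following
sketch is a classical illustration only, with a hypothetical function name; it assumes
that $a(i)$ is coprime to $N$ so that the modular inverse exists, which holds because $m$
is coprime to $N$.

```python
def controlled_multiply(x: int, a: int, k_i: int, N: int):
    """Classical trace of eqs. (9)-(11) on the register pair (first, second)."""
    first, second = x, 0
    if k_i == 1:
        second = (second + a * first) % N        # eq. (9): add a(i) x(i) mod N
        first, second = second, first            # eq. (10): swap the registers
        a_inv = pow(a, -1, N)                    # modular inverse (Python 3.8+)
        second = (second - a_inv * first) % N    # eq. (11): clears the second register
    return first, second
```

For $N = 15$, $a = 7$ and $x = 4$ the pair evolves as $(4, 0) \rightarrow (4, 13) \rightarrow
(13, 4) \rightarrow (13, 0)$, so the second register is returned to 0 and can be reused.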

The first and third steps described in the previous paragraph are further broken up into
series of controlled modular additions and subtractions respectively.

$$ 0 + a(i)x(i) = 0 + \sum_{j=0}^{L-1} a(i)2^j x(i)_j \bmod N, \tag{12} $$

$$ x(i) - a(i)^{-1}x(i+1) = x(i) - \sum_{j=0}^{L-1} a(i)^{-1}2^j x(i+1)_j \bmod N, \tag{13} $$

where $x(i)_j$ and $x(i+1)_j$ denote the $j$th bit of $x(i)$ and $x(i+1)$ respectively.
Note that the additions associated with a given $x(i)_j$ can only occur if $x(i)_j = 1$ and
similarly for the subtractions. Given that these additions and subtractions form a
multiplication that is conditional on $k_i$, it is also necessary that $k_i = 1$.

Further decomposition will be left for subsequent sections.
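
A classical sketch of eq. (12) is given below as an illustration only; the function name
is hypothetical. Each addition is applied only when both the relevant bit of $x(i)$ and the
control bit $k_i$ are 1.

```python
def multiply_by_additions(a: int, x: int, k_i: int, N: int, L: int) -> int:
    """Accumulate a*x mod N as in eq. (12), one modular addition per set bit of x."""
    acc = 0
    for j in range(L):
        x_j = (x >> j) & 1
        if k_i == 1 and x_j == 1:              # doubly controlled modular addition
            constant = (a * 2 ** j) % N        # classical constant a(i) 2^j mod N
            acc = (acc + constant) % N
    return acc
```

The subtractions of eq. (13) follow the same pattern with $a(i)^{-1}$ in place of $a(i)$ and
the accumulator starting from $x(i)$.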



IV. CANONICAL DECOMPOSITION

A crucial part of building efficient quantum circuits is building efficient 2-qubit gates.
This is particularly true for LNN circuits in which it is common to follow productive
gates such as controlled-NOT (CNOT) or controlled-phase (cphase) with swap gates designed
to bring other pairs of qubits together to be interacted. Such pairs of 2-qubit gates can
and should be combined into a single compound gate.

The canonical decomposition enables any 2-qubit operator $U_{AB}$ to be expressed
(non-uniquely) in the form $(V_A \otimes V_B)\, U_d\, (U_A \otimes U_B)$ where $U_A$, $U_B$,
$V_A$ and $V_B$ are single-qubit unitaries and
$U_d = \exp[i(\alpha_x X \otimes X + \alpha_y Y \otimes Y + \alpha_z Z \otimes Z)]$. Moreover,
any entangling interaction can be used to create an arbitrary $U_d$ up to single-qubit
rotations [23].

Fig. 1a shows the form of a swap gate decomposed into a series of physical operations via
the canonical decomposition [22]. The Kane architecture [30] has been used for
illustrative purposes. Note that the full circles in the figure represent Z-rotations of
angle dependent on the physical construction of the computer. Fig. 1b shows an
implementation of the composite gate Hadamard followed by a controlled $\pi/2$ phase
rotation followed by a swap. Note that the total time of the compound gate is almost the
same as the swap gate on its own. In certain cases, the total time of the compound gate
can be much less than the time required to implement any one of its constituent gates
[31]. In this paper, every sequence of 1- and 2-qubit gates that are applied to the same
two qubits has been implemented as a canonically decomposed compound gate.

FIG. 1: a.) Swap gate expressed as a sequence of physical operations via the canonical
decomposition. b.) Similarly decomposed compound gate consisting of a Hadamard gate,
controlled phase rotation, and swap gate. Note that the Kane architecture [30] has been
used for illustrative purposes.



V. QUANTUM FOURIER TRANSFORM 

The first circuit that needs to be described, as it will be used in all subsequent
circuits, is the QFT.

$$ |k\rangle \rightarrow \frac{1}{\sqrt{2^L}} \sum_{j=0}^{2^L-1} \exp(2\pi i jk/2^L)|j\rangle \tag{14} $$

Fig. 2a shows the usual circuit design for an architecture that can interact arbitrary
pairs of qubits. Fig. 2b shows the same circuit rearranged with the aid of swap gates to
allow it to be implemented on an LNN architecture. Dashed boxes indicate compound gates.
Note that the general circuit inverts the most significant to least significant ordering
of the qubits whereas the LNN circuit does not.

FIG. 2: a.) Standard quantum Fourier transform circuit. b.) An equivalent linear nearest
neighbour circuit.

Counting compound gates as one, the total number of gates required to implement a QFT on
both the general and LNN architectures is $L(L-1)/2$. Assuming gates can be implemented
in parallel, the minimum circuit depth for both is $2L - 3$.
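
These counts can be reproduced with a simple scheduling sketch. The code below is an
illustration under an assumption that is not read off the figure: the compound gate acting
on logical qubits $a < b$ (controlled phase plus swap) is placed in layer $a + b$. The
function names are hypothetical. The second function checks that every pair is adjacent on
the line at the moment it is interacted.

```python
def lnn_qft_schedule(L: int):
    """Layers of compound gates; the pair (a, b) with a < b sits in layer a + b."""
    layers = []
    for t in range(1, 2 * L - 2):                     # layers t = 1 .. 2L - 3
        layer = [(a, t - a) for a in range((t + 1) // 2) if t - a < L]
        layers.append(layer)
    return layers


def final_positions(L: int):
    """Apply the schedule on a line, asserting every gate is nearest neighbour."""
    position = list(range(L))        # position[q] = site currently holding qubit q
    for layer in lnn_qft_schedule(L):
        for a, b in layer:
            assert abs(position[a] - position[b]) == 1, "gate is not nearest neighbour"
            position[a], position[b] = position[b], position[a]   # swap half of the gate
    return position
```

In this schedule every pair of qubits meets exactly once, which gives $L(L-1)/2$ compound
gates in $2L - 3$ layers, and the physical order of the qubits ends up reversed.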
FIG. 3: a.) Quantum Fourier addition. b.) Controlled quantum Fourier addition and its
symbolic equivalent circuit.



VI. MODULAR ADDITION 

Given a quantum register containing an arbitrary superposition of binary numbers, there
is a particularly easy way to add a binary number to each number in the superposition
[21]. By quantum Fourier transforming the superposition, the addition can be performed
simply by applying appropriate single-qubit rotations as shown in fig. 3a. Such an
addition can also very easily be made dependent on a single control qubit as shown in
fig. 3b.
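
The effect of Fourier addition can be checked numerically on a small state vector. The
sketch below is an illustration only: it uses a dense $O(M^2)$ transform rather than a gate
level simulation, and the function names are hypothetical. The phase applied to basis
state $|j\rangle$ factorises into one rotation per bit of $j$, which is why single-qubit
rotations suffice.

```python
import cmath


def qft(state, inverse=False):
    """Dense discrete Fourier transform of a state vector of length M = 2**n."""
    M = len(state)
    sign = -1 if inverse else 1
    return [
        sum(state[k] * cmath.exp(sign * 2j * cmath.pi * j * k / M) for k in range(M)) / M ** 0.5
        for j in range(M)
    ]


def fourier_add(state, a):
    """Add the constant a in Fourier space using one phase rotation per qubit."""
    M = len(state)
    n = M.bit_length() - 1
    out = []
    for j, amplitude in enumerate(state):
        phase = 1
        for l in range(n):
            if (j >> l) & 1:         # rotation applied to qubit l when it is in |1>
                phase *= cmath.exp(2j * cmath.pi * a * (1 << l) / M)
        out.append(amplitude * phase)
    return out


def add_constant(b, a, n):
    """Return the value held by an n qubit register after adding a to |b>."""
    M = 1 << n
    state = [0j] * M
    state[b] = 1
    state = qft(fourier_add(qft(state), a), inverse=True)
    return max(range(M), key=lambda j: abs(state[j]))
```

With $n = 4$, adding 6 to 5 gives 11, and adding 6 to 13 wraps around to 3, since the plain
Fourier addition is modulo $2^n$ rather than modulo $N$.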

Performing controlled modular addition is considerably more complicated as shown in
fig. 4. This circuit adds $2^j m^{2^i} \bmod N$ to the register containing the Fourier-space
representation $\phi(b)$ of $b$ to obtain $\phi(c) = \phi(b + 2^j m^{2^i} \bmod N)$ iff both
$x(i)_j$ and $k_i$ are 1. The first five gates comprise a Toffoli gate that sets $kx = 1$
iff $x(i)_j = k_i = 1$. $k_i$ and $x(i)_j$ are defined in eq. (7) and eqs. (12, 13)
respectively. Note that the Beauregard circuit does not have a $kx$ qubit, but without it
the singly-controlled Fourier additions become doubly-controlled and take four times as
long. The calculations of the gate count and circuit depth of the Beauregard circuit
presented in this paper have therefore been done with a $kx$ qubit included.

The next circuit element firstly adds $2^j m^{2^i} \bmod N$ iff $kx = 1$ then subtracts $N$.
If $b + (2^j m^{2^i} \bmod N) < N$, subtracting $N$ will result in a negative number. In a
binary register, this means that the most significant bit will be 1. The next circuit
element is an inverse QFT which takes the addition result out of Fourier space and allows
the most significant bit to be accessed by the following CNOT. The MS (Most Significant)
qubit will now be 1 iff the addition result was negative. If
$b + (2^j m^{2^i} \bmod N) \ge N$, subtracting $N$ will yield the non-negative number
$(b + 2^j m^{2^i}) \bmod N$ and the MS qubit will remain set to 0.

We now encounter the first circuit element that would not be present if interactions
between arbitrary pairs of qubits were possible. Note that while this "long swap"
operation technically consists of $L$ regular swap gates, it only increases the depth of
the circuit by 1. The subsequent QFT enables the MS controlled Fourier addition of $N$
yielding the positive number $(b + 2^j m^{2^i}) \bmod N$ if $MS = 1$ and leaving the already
correct result unchanged if $MS = 0$.

While it might appear that we are now done, the qubits MS and $kx$ must be reset so they
can be reused. The next circuit element subtracts $2^j m^{2^i} \bmod N$. The result will be
positive and hence the most significant bit of the result equal to 0 iff the very first
addition $b + (2^j m^{2^i} \bmod N)$ gave a number less than $N$. This corresponds to the
$MS = 1$ case. After another inverse QFT to allow the most significant bit of the result to
be accessed, the MS qubit is reset by a CNOT gate that flips the target qubit iff the
control qubit is 0. Note that the long swap operation that occurs in the middle of all
this to move the $kx$ qubit to a more convenient location only increases the depth of the
circuit by 1.

After adding back $2^j m^{2^i} \bmod N$, the next few gates form a Toffoli gate that resets
$kx$. The final two swap gates move $x(i)_{j+1}$ into position ready for the next modular
addition. Note that the L and R gates are inverses of one another and hence not required
if modular additions precede and follow the circuit shown. Only one of the final two swap
gates contributes to the overall depth of the circuit.

FIG. 4: Circuit to compute $c = (b + 2^j m^{2^i}) \bmod N$. The diagonal circuit elements
labelled swap represent a series of 2-qubit swap gates. Small gates spaced close together
represent compound gates. The qubits $x(i)$ are defined in eq. (8) and essentially store
the current partially calculated value of the modular exponentiation that forms the heart
of Shor's algorithm. The MS (Most Significant) qubit is used to keep track of the sign of
the partially calculated modular addition result. The $k_i$ qubit is the $i$th bit of $k$ in
eq. (7). The $kx$ qubit is set to 1 if and only if $x(i)_j = k_i = 1$.

FIG. 5: Circuit designed to interleave two quantum registers.

The total gate count of the LNN modular addition circuit is $2L^2 + 8L + 22$ and compares
very favourably with the general architecture gate count of $2L^2 + 6L + 14$. Similarly,
the LNN depth is $8L + 16$ versus the general depth of $8L + 13$.
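
The logic of the modular addition circuit, including the resetting of the MS qubit, can be
followed on ordinary integers. The sketch below is a classical trace only: it ignores the
QFTs, swaps and Fourier-space encoding, assumes $0 \le a, b < N$, and uses a hypothetical
function name. Here $a$ stands for the constant $2^j m^{2^i} \bmod N$.

```python
def modular_add_trace(b: int, a: int, N: int, kx: int = 1):
    """Follow the value register and the MS flag through the modular addition steps."""
    ms = 0
    value = b + (a if kx else 0) - N      # add a iff kx = 1, then subtract N
    if value < 0:                         # sign bit copied onto MS by the first CNOT
        ms ^= 1
    if ms:                                # MS controlled addition of N
        value += N
    result = value                        # equals (b + a) mod N when kx = 1
    check = result - (a if kx else 0)     # subtract a again to test the first addition
    if check >= 0:                        # second CNOT flips MS iff the sign bit is 0
        ms ^= 1
    # a is then added back, restoring `result`, and a final Toffoli resets kx
    return result, ms
```

In both branches the MS flag returns to 0, which is what allows the MS and $kx$ qubits to
be reused by the next modular addition.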



VII. CONTROLLED SWAP 

Performing a controlled swap of two large registers is slightly more difficult when only
LNN interactions are available. The two registers need to be meshed so that pairs of
equally significant qubits can be controlled-swapped. The mesh circuit is shown in
fig. 5. This circuit element would not be required in a general architecture.

After the mesh circuit has been applied, the functional part of the controlled swap
circuit (fig. 6) can be applied optimally with the control qubit moving from one end of
the meshed registers to the other. The mesh circuit is then applied in reverse to untangle
the two registers.

The gate count and circuit depth of a mesh circuit are $L(L-1)/2$ and $L - 1$
respectively. The corresponding equations for a complete LNN controlled swap are
$L^2 + 5L$ and $6L$. The general controlled swap only requires $6L$ gates and can be
implemented in a circuit of depth $4L + 2$. The controlled swap is the only part of this
implementation of Shor's algorithm that is significantly more difficult to implement on an
LNN architecture.
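
The mesh counts can be reproduced by interleaving two registers with nearest neighbour
swaps. The sketch below is one possible layering chosen for illustration and is not read
off the figure; the function names are hypothetical. The second function additionally
assumes that the functional part of the LNN controlled swap costs the same $6L$ gates and
depth $4L + 2$ quoted for the general controlled swap, which is consistent with the totals
stated above.

```python
def mesh(L: int):
    """Interleave registers a and b on a line using only adjacent swaps."""
    line = [f"a{q}" for q in range(L)] + [f"b{q}" for q in range(L)]
    gates = depth = 0
    for t in range(1, L):                 # layer t contains t disjoint swaps
        for s in range(t):
            i = L - t + 2 * s
            line[i], line[i + 1] = line[i + 1], line[i]
            gates += 1
        depth += 1
    return line, gates, depth


def lnn_cswap_counts(L: int):
    """Mesh, functional controlled swap (assumed 6L gates, depth 4L + 2), reverse mesh."""
    _, mesh_gates, mesh_depth = mesh(L)
    return 2 * mesh_gates + 6 * L, 2 * mesh_depth + 4 * L + 2
```

For $L = 4$ the line `a0 a1 a2 a3 b0 b1 b2 b3` becomes `a0 b0 a1 b1 a2 b2 a3 b3` after 6
swaps in 3 layers.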



VIII. MODULAR MULTIPLICATION 

The ideas behind the modular multiplication circuit of fig. 7 were discussed in
section III. The first third comprises a controlled modular multiply (via repeated
addition) with the result being stored in a temporary register. The middle third
implements a controlled swap of registers. The final third resets the temporary register.

FIG. 6: a.) [...] qubits [...] potentially swapped states. b.) LNN circuit for the
controlled swapping of two quantum registers. Note that when chained together, the
effective depth of the cswap gate is 4.

Note that the main way in which the performance of 
the LNN circuit differs from the ideal general case is due 
to the inclusion of the two mesh circuits. Nearly all of the 
remaining swaps shown in the circuit do not contribute to 
the overall depth. Note that the two swaps drawn within 
the QFT and inverse QFT are intended to indicate the 
appending of a swap gate to the first and last compound 
gates in these circuits respectively. 

The total gate count for the LNN modular multiplication circuit is
$4L^3 + 20L^2 + 58L - 2$ versus the general gate count of $4L^3 + 13L^2 + 35L + 4$. The LNN
depth is $16L^2 + 40L - 7$ and the general depth $16L^2 + 33L - 6$.



IX. COMPLETE CIRCUIT 

The complete circuit for Shor's algorithm (fig. 8) can best be understood with reference
to fig. 2 and the five steps described in section II. The last two steps of Shor's
algorithm are a QFT and measurement of the qubits involved in the QFT. When a 2-qubit
controlled quantum gate is followed by measurement of the control qubit, it is equivalent
to measure the control qubit first and then apply a classically controlled gate to the
target qubit. If this is done to every qubit in fig. 2 it can be seen that every qubit is
decoupled. Furthermore, since the QFT is applied to the $k$ register and the $k$ register
qubits are never interacted with one another, it is possible to arrange the circuit such
that each qubit in the $k$ register is sequentially used to control a modular
multiplication, QFTed, then measured. Even better, after the first qubit of the $k$
register is manipulated in this manner, it can be reset and used as the second qubit of
the $k$ register. This one qubit trick forms the basis of fig. 8.

FIG. 7: Circuit designed to modularly multiply $x(i)$ by $m^{2^i}$ if and only if $k_i = 1$.
Note that for simplicity the circuit for $L = 4$ has been shown. Note that the bottom
$L + 1$ qubits are ancilla and as such start and end in the $|\phi(0)\rangle$ state. The swap
gates within the two QFT structures represent compound gates.

The total number of gates required in the LNN and general cases are
$8L^4 + 40L^3 + 116.5L^2 + 4.5L - 2$ and $8L^4 + 26L^3 + 70.5L^2 + 8.5L - 1$ respectively.
The circuit depths are $32L^3 + 80L^2 - 4L - 2$ and $32L^3 + 66L^2 - 2L - 1$ respectively.
The primary result of this paper is that the gate count and depth equations for both
architectures are identical to first order.
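
To give a feel for how quickly the lower order terms become negligible, the sketch below
evaluates the four expressions for a given $L$. It is an illustration only; the function
names are hypothetical and the half-integer coefficients are written as exact integer
arithmetic.

```python
def lnn_counts(L: int):
    """Gate count and depth of the complete LNN circuit."""
    gates = 8 * L**4 + 40 * L**3 + (233 * L**2 + 9 * L) // 2 - 2    # 116.5 L^2 + 4.5 L
    depth = 32 * L**3 + 80 * L**2 - 4 * L - 2
    return gates, depth


def general_counts(L: int):
    """Gate count and depth when arbitrary pairs of qubits can be interacted."""
    gates = 8 * L**4 + 26 * L**3 + (141 * L**2 + 17 * L) // 2 - 1   # 70.5 L^2 + 8.5 L
    depth = 32 * L**3 + 66 * L**2 - 2 * L - 1
    return gates, depth


def overhead(L: int):
    """Ratios (gates, depth) of the LNN circuit to the general circuit."""
    (lg, ld), (gg, gd) = lnn_counts(L), general_counts(L)
    return lg / gg, ld / gd
```

Both ratios tend to 1 as $L$ grows, since the leading terms $8L^4$ and $32L^3$ are shared
by the two architectures.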



X. CONCLUSION

We have presented a circuit implementing Shor's algorithm in a manner appropriate for a
linear nearest neighbour qubit array and studied the number of extra gates and consequent
increase in circuit depth such a design entails. To first order our circuit involves
$8L^4$ gates arranged in a circuit of depth $32L^3$ on $2L + 4$ qubits, figures identical
to those possible when interactions between arbitrary pairs of qubits are allowed. Given
the importance of Shor's algorithm, this result supports the widespread experimental study
of linear nearest neighbour architectures.

Simulations of the robustness of the circuit when subjected to noise are in progress.
Future simulations will investigate the performance of the circuit when protected by LNN
quantum error correction.
FIG. 8: Circuit designed to compute Shor's algorithm. The single-qubit gates interleaved
between the modular multiplications comprise a QFT that has been decomposed by using
measurement gates to remove the need for controlled quantum phase rotations. Note that
without these single-qubit gates the remaining circuit is simply modular exponentiation.



